Implementing Low Latency Distributed Software-Based Shared Memory
Abstract
Software implementations of shared memory are still far behind the performance of hardware-based shared memory implementations (HW-DSM) and are not viable options for most fine-grain shared memory applications. The major source of their inefficiency is the cost of interrupt-based asynchronous protocol processing, not the actual network latency. As the raw hardware latency of inter-node communication decreases, the asynchronous overhead in the communication becomes more dominant. We describe how all the interrupt- and/or poll-based asynchronous protocol processing can be completely removed by running the entire coherence protocol in the requesting processor. This not only removes the asynchronous overhead, but also makes use of a processor that otherwise would stall. The technique is applicable to both page-based and fine-grain software-based shared memory. DSZOOM-WF, the implementation presented in this paper, is a sequentially consistent, fine-grain distributed software-based shared memory. It demonstrates a protocol-handling overhead below a microsecond for all the actions involved in a remote load operation, compared with around ten microseconds for the fastest implementation to date. The all-software protocol is implemented assuming some basic low-level primitives in the cluster interconnect and an operating system bypass functionality, similar to the emerging InfiniBand standard. DSZOOM-WF demonstrates consistently comparable performance to HW-DSM implementations.
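The central idea of the abstract, running the whole coherence protocol in the requesting processor, can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the DSZOOM-WF protocol itself: the names Node, load and store, the three-state scheme, and the per-line directory layout are all hypothetical, and a threading.Lock stands in for a remote atomic primitive of the interconnect. What it tries to show is that the requester locks the home node's directory entry, reads and updates remote state with one-sided accesses, and unlocks, while no handler code ever runs on the home node or on the previous owner.

```python
import threading

# Hypothetical three-state scheme, chosen for illustration only.
INVALID, SHARED, EXCLUSIVE = "I", "S", "E"


class Node:
    """One cluster node, modeled as passive memory that other nodes
    read and write with one-sided operations (no handlers run here)."""

    def __init__(self, node_id, num_lines):
        self.node_id = node_id
        self.data = [0] * num_lines          # this node's copy of each line
        self.state = [INVALID] * num_lines   # this node's access state per line
        # Directory fields are meaningful only on a line's home node.
        # The lock models a remote atomic (e.g. fetch-and-set); assumption.
        self.dir_lock = [threading.Lock() for _ in range(num_lines)]
        self.dir_owner = [None] * num_lines  # id of the exclusive holder, if any
        self.dir_sharers = [set() for _ in range(num_lines)]


def load(requester, home, nodes, line):
    """Load executed entirely by the requester, including the miss protocol."""
    if requester.state[line] != INVALID:
        return requester.data[line]          # local hit, no communication
    with home.dir_lock[line]:                # "remote atomic" acquire
        owner_id = home.dir_owner[line]
        if owner_id is not None:
            owner = nodes[owner_id]
            value = owner.data[line]         # one-sided read from the owner
            owner.state[line] = SHARED       # one-sided write: downgrade owner
            home.data[line] = value          # write the line back to its home
            home.dir_sharers[line].add(owner_id)
            home.dir_owner[line] = None
        else:
            value = home.data[line]          # one-sided read from the home
        home.dir_sharers[line].add(requester.node_id)
        requester.data[line] = value
        requester.state[line] = SHARED
    return value


def store(requester, home, nodes, line, value):
    """Store executed entirely by the requester; it invalidates other copies.
    For simplicity this sketch always goes through the directory."""
    with home.dir_lock[line]:
        for sharer_id in home.dir_sharers[line]:
            if sharer_id != requester.node_id:
                nodes[sharer_id].state[line] = INVALID   # one-sided write
        owner_id = home.dir_owner[line]
        if owner_id is not None and owner_id != requester.node_id:
            nodes[owner_id].state[line] = INVALID
        home.dir_sharers[line] = set()
        home.dir_owner[line] = requester.node_id
        requester.data[line] = value
        requester.state[line] = EXCLUSIVE


# Usage: node 1 writes a line homed on node 0, then node 2 reads it.
cluster = [Node(i, num_lines=4) for i in range(3)]
store(cluster[1], cluster[0], cluster, line=2, value=42)
print(load(cluster[2], cluster[0], cluster, line=2))  # prints 42
```

The sketch is exercised from a single thread only. A real implementation would also have to make local hits safe against concurrent remote downgrades and invalidations, which this simulation does not attempt.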
Similar resources
Coherence-Centric Logging and Recovery for Home-Based Software Distributed Shared Memory
The probability of failures in software distributed shared memory (SDSM) increases as the system size grows. This paper introduces a new, efficient message logging technique, called the coherence-centric logging (CCL) and recovery protocol, for home-based SDSM. Our CCL minimizes failure-free overhead by logging only data necessary for correct recovery and tolerates high disk access latency by o...
System Software Support for Reducing Memory Latency on Distributed Shared Memory Multiprocessors
This paper overviews results from our recent work on building customized system software support for Distributed Shared Memory Multiprocessors. The mechanisms and policies outlined in this paper are connected with a single conceptual thread: they all attempt to reduce the memory latency of parallel programs by optimizing critical system services, while hiding the complex architectural details o...
The Effect of Contention on the Scalability of Page-Based Software Shared Memory Systems
We demonstrate the profound effects of contention on the performance of page-based software distributed shared memory systems, as such systems are scaled to a larger number of nodes. We argue that applications that suffer from increases in memory latency due to contention and imbalances in protocol load scale poorly. Furthermore, we show that there is a relationship between protocol imbalance, c...
Comparing Latency-Tolerance Techniques for Software DSM Systems
This paper studies the isolated and combined effects of several latency-tolerance techniques for software-based distributed shared-memory systems (software DSMs). More specifically, we focus on data prefetching, update-based coherence, and single-writer optimizations for page-based software DSMs. Our experimental results with 6 parallel applications show that when these techniques are carefully...
Performance and Energy Evaluation of Memory Organizations in NoC-Based MPSoCs under Latency and Task Migration
This chapter presents a study on the performance and energy consumption arising from distinct memory organizations in an NoC-based MPSoC environment. This evaluation considers three sets of experiments. The first one evaluates the performance and energy efficiency of four different memory organizations in a situation where a single application is executed. In the second experiment, a traffic ge...